Preprocessing for PPM: Compressing UTF-8 Encoded Natural Language Text
Similar resources
Preprocessing for PPM: Compressing UTF-8 Encoded Natural Language Text
In this paper, several new universal preprocessing techniques are described to improve Prediction by Partial Matching (PPM) compression of UTF-8 encoded natural language text. These methods adjust the alphabet in some manner (for example, by expanding or reducing it) before the compression algorithm is applied to the amended text. Firstly, a simple bigraphs (two-byte) subs...
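To illustrate what "expanding the alphabet" before compression can look like, here is a minimal sketch. It assumes the truncated sentence above refers to replacing frequent two-byte pairs with single new symbols; the function names, the table size `k`, and the choice of symbol values starting at 256 are hypothetical and are not taken from the paper. The PPM compressor itself is not shown.

```python
from collections import Counter


def build_bigraph_table(data: bytes, k: int = 16) -> dict:
    """Map the k most frequent adjacent byte pairs to new symbols 256, 257, ...

    Hypothetical helper: the paper's actual selection rule is not given here.
    """
    pairs = Counter(zip(data, data[1:]))
    return {pair: 256 + i for i, (pair, _) in enumerate(pairs.most_common(k))}


def encode(data: bytes, table: dict) -> list:
    """Greedy left-to-right substitution of table pairs by single symbols."""
    out, i = [], 0
    while i < len(data):
        pair = (data[i], data[i + 1]) if i + 1 < len(data) else None
        if pair in table:
            out.append(table[pair])  # one symbol from the expanded alphabet
            i += 2
        else:
            out.append(data[i])      # ordinary byte value, 0..255
            i += 1
    return out


def decode(symbols: list, table: dict) -> bytes:
    """Invert encode(): expand every new symbol back into its two bytes."""
    inverse = {sym: pair for pair, sym in table.items()}
    out = bytearray()
    for s in symbols:
        if s in inverse:
            out.extend(inverse[s])
        else:
            out.append(s)
    return bytes(out)
```

In UTF-8, every character of a script such as Arabic or Persian occupies two bytes, so frequent byte pairs often coincide with whole characters; that is the intuition this sketch relies on. The resulting symbol sequence would then be passed to a compressor that accepts an alphabet larger than 256 symbols.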
PPMexe: PPM for Compressing Software
With the emergence of software delivery platforms such as Microsoft’s .NET, code compression has become one of the core enabling technologies strongly affecting system performance. In this paper, we present PPMexe, a set of compression mechanisms for executables that explores their syntax and semantics to achieve superior compression rates. The fundament of PPMexe is the generic paradigm of pred...
Natural Language Compression on Edge-Guided text preprocessing
This paper presents Edge-Guided (E-G), an optimized text preprocessing technique for compression purposes. It transforms the original text into a word net, which stores all relationships between adjoining words. A specific directed graph is proposed to model this transformation: words are stored in vertices, whereas edges represent word transitions. Thus, the word net has a text representation ...
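The word-net idea described above (words as vertices, word transitions as directed edges) can be sketched as follows. This is only an illustration of the graph structure: the function name `build_word_net`, whitespace tokenization, and the per-edge transition counts are assumptions, and the way E-G actually serializes the graph for compression is not covered.

```python
from collections import defaultdict


def build_word_net(text: str) -> dict:
    """Build a directed graph: vertex = word, edge a -> b = 'b follows a'.

    Edge values count how often each transition occurs (an assumption made
    for illustration; the abstract does not specify edge attributes).
    """
    words = text.split()  # naive whitespace tokenization, assumed here
    net = defaultdict(lambda: defaultdict(int))
    for current_word, next_word in zip(words, words[1:]):
        net[current_word][next_word] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(successors) for word, successors in net.items()}


net = build_word_net("the cat saw the dog and the cat")
print(net["the"])  # {'cat': 2, 'dog': 1}
```

A word that appears only at the very end of the text has no outgoing edge and therefore does not appear as a key in this representation.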
Semantic Information Preprocessing for Natural Language Interfaces to Databases
An approach is described for supplying selectional restrictions to parsers in natural language interfaces (NLIs) to databases by extracting the selectional restrictions from semantic descriptions of those NLIs. Automating the process of finding selectional restrictions reduces NLI development time and may avoid errors introduced by hand-coding selectional restrictions.
Discourse Strategies for Generating Natural-Language Text
If a generation system is to produce text in response to a given communicative goal, it must be able to determine what to include in its text and how to organize this information so that it can be easily understood. In this paper, a computational model of discourse strategies is presented that can be used to guide the generation process in its decisions about what to say next. The model is base...
Journal
Journal title: International Journal of Computer Science and Information Technology
Year: 2015
ISSN: 0975-4660,0975-3826
DOI: 10.5121/ijcsit.2015.7204